String hashing for linear probing
نویسنده
چکیده
Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC’07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing leads to an expected number of Ω(log n) probes. They also showed that with 5-universal hashing, the expected number of probes is constant. Unfortunately, we do not have 5-universal hashing for, say, variable length strings. When we want to do such complex hashing from a complex domain, the generic standard solution is that we first do collision free hashing (w.h.p.) into a simpler intermediate domain, and second do the complicated hash function on this intermediate domain. Our contribution is that for an expected constant number of linear probes, it is suffices that each key has O(1) expected collisions with the first hash function, as long as the second hash function is 5-universal. This means that the intermediate domain can be n times smaller, and such a smaller intermediate domain typically means that the overall hash function can be made simpler and at least twice as fast. The same doubling of hashing speed for O(1) expected probes follows for most domains bigger than 32-bit integers, e.g., 64-bit integers and fixed length strings. In addition, we study how the overhead from linear probing diminishes as the array gets larger, and what happens if strings are stored directly as intervals of the array. These cases were not considered by Pagh et al.
منابع مشابه
Linear Probing with 5-wise Independence
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key,value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free...
متن کاملA unified approach to linear probing hashing
We give a unified analysis of linear probing hashing with a general bucket size. We use both a combinatorial approach, giving exact formulas for generating functions, and a probabilistic approach, giving simple derivations of asymptotic results. Both approaches complement nicely, and give a good insight in the relation between linear probing and random walks. A key methodological contribution, ...
متن کاملTabulation Based 5-Universal Hashing and Linear Probing
Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed 4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the pre-computed tables are made 5-universal, then the hash value becomes 5universal without any other change to the comp...
متن کاملDesign and Analysis of Hashing Algorithms with Cache Eeects
This paper investigates the performance of hashing algorithms by both an experimental and an analytical approach. We examine the performance of three classical hashing algorithms: chaining, double hashing and linear probing. Our experimental results show that, despite the theoretical superiority of chaining and double hashing, linear probing outperforms both for random lookups. We explore varia...
متن کاملWear Minimization for Cuckoo Hashing: How Not to Throw a Lot of Eggs into One Basket
We study wear-leveling techniques for cuckoo hashing, showing that it is possible to achieve a memory wear bound of log logn + O(1) after the insertion of n items into a table of sizeCn for a suitable constantC using cuckoo hashing. Moreover, we study our cuckoo hashing method empirically, showing that it significantly improves on the memory wear performance for classic cuckoo hashing and linea...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009